{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# LAB 5a:  Training Keras model on Cloud AI Platform.\n",
    "\n",
    "**Learning Objectives**\n",
    "\n",
    "1. Setup up the environment\n",
    "1. Create trainer module's task.py to hold hyperparameter argparsing code\n",
    "1. Create trainer module's model.py to hold Keras model code\n",
    "1. Run trainer module package locally\n",
    "1. Submit training job to Cloud AI Platform\n",
    "1. Submit hyperparameter tuning job to Cloud AI Platform\n",
    "\n",
    "\n",
    "## Introduction\n",
    "After having testing our training pipeline both locally and in the cloud on a susbset of the data, we can submit another (much larger) training job to the cloud. It is also a good idea to run a hyperparameter tuning job to make sure we have optimized the hyperparameters of our model. \n",
    "\n",
    "In this notebook, we'll be training our Keras model at scale using Cloud AI Platform.\n",
    "\n",
    "In this lab, we will set up the environment, create the trainer module's task.py to hold hyperparameter argparsing code, create the trainer module's model.py to hold Keras model code, run the trainer module package locally, submit a training job to Cloud AI Platform, and submit a hyperparameter tuning job to Cloud AI Platform.\n",
    "\n",
    "Each learning objective will correspond to a __#TODO__ in this student lab notebook -- try to complete this notebook first and then review the [solution notebook](../solutions/5a_train_keras_ai_platform_babyweight.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "hJ7ByvoXzpVI"
   },
   "source": [
    "## Set up environment variables and load necessary libraries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import necessary libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Lab Task #1: Set environment variables.\n",
    "\n",
    "Set environment variables so that we can use them throughout the entire lab. We will be using our project name for our bucket, so you only need to change your project and region."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "export PROJECT=$(gcloud config list project --format \"value(core.project)\")\n",
    "echo \"Your current GCP Project Name is: \"${PROJECT}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: Change these to try this notebook out\n",
    "PROJECT = \"cloud-training-demos\"  # Replace with your PROJECT\n",
    "BUCKET = PROJECT  # defaults to PROJECT\n",
    "REGION = \"us-central1\"  # Replace with your REGION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ[\"PROJECT\"] = PROJECT\n",
    "os.environ[\"BUCKET\"] = BUCKET\n",
    "os.environ[\"REGION\"] = REGION\n",
    "os.environ[\"TFVERSION\"] = \"2.1\"\n",
    "os.environ[\"PYTHONVERSION\"] = \"3.7\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "gcloud config set project $PROJECT\n",
    "gcloud config set compute/region $REGION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "if ! gcloud storage ls | grep -q gs://${BUCKET}; then\n",    "    gcloud storage buckets create --location ${REGION} gs://${BUCKET}\n",    "fi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Check data exists\n",
    "\n",
    "Verify that you previously created CSV files we'll be using for training and evaluation. If not, go back to lab [1b_prepare_data_babyweight.ipynb](../solutions/1b_prepare_data_babyweight.ipynb) to create them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "gcloud storage ls gs://${BUCKET}/babyweight/data/*000000000000.csv"   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have the [Keras wide-and-deep code](../solutions/4c_keras_wide_and_deep_babyweight.ipynb) working on a subset of the data, we can package the TensorFlow code up as a Python module and train it on Cloud AI Platform.\n",
    "\n",
    "## Train on Cloud AI Platform\n",
    "\n",
    "Training on Cloud AI Platform requires:\n",
    "* Making the code a Python package\n",
    "* Using gcloud to submit the training code to [Cloud AI Platform](https://console.cloud.google.com/ai-platform)\n",
    "\n",
    "**Ensure that the Cloud AI Platform API is enabled by going to this [link](https://console.developers.google.com/apis/library/ml.googleapis.com).**\n",
    "\n",
    "### Move code into a Python package\n",
    "\n",
    "A Python package is simply a collection of one or more `.py` files along with an `__init__.py` file to identify the containing directory as a package. The `__init__.py` sometimes contains initialization code but for our purposes an empty file suffices.\n",
    "\n",
    "The bash command `touch` creates an empty file in the specified location, the directory `babyweight` should already exist."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "mkdir -p babyweight/trainer\n",
    "touch babyweight/trainer/__init__.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We then use the `%%writefile` magic to write the contents of the cell below to a file called `task.py` in the `babyweight/trainer` folder."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Lab Task #2: Create trainer module's task.py to hold hyperparameter argparsing code.\n",
    "\n",
    "The cell below writes the file `babyweight/trainer/task.py` which sets up our training job. Here is where we determine which parameters of our model to pass as flags during training using the `parser` module. Look at how `batch_size` is passed to the model in the code below. Use this as an example to parse arguements for the following variables\n",
    "- `nnsize` which represents the hidden layer sizes to use for DNN feature columns\n",
    "- `nembeds` which represents the embedding size of a cross of n key real-valued parameters\n",
    "- `train_examples` which represents the number of examples (in thousands) to run the training job\n",
    "- `eval_steps` which represents the positive number of steps for which to evaluate model\n",
    "\n",
    "Be sure to include a default value for the parsed arguments above and specfy the `type` if necessary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile babyweight/trainer/task.py\n",
    "import argparse\n",
    "import json\n",
    "import os\n",
    "\n",
    "from trainer import model\n",
    "\n",
    "import tensorflow as tf\n",
    "\n",
    "if __name__ == \"__main__\":\n",
    "    parser = argparse.ArgumentParser()\n",
    "    parser.add_argument(\n",
    "        \"--job-dir\",\n",
    "        help=\"this model ignores this field, but it is required by gcloud\",\n",
    "        default=\"junk\"\n",
    "    )\n",
    "    parser.add_argument(\n",
    "        \"--train_data_path\",\n",
    "        help=\"GCS location of training data\",\n",
    "        required=True\n",
    "    )\n",
    "    parser.add_argument(\n",
    "        \"--eval_data_path\",\n",
    "        help=\"GCS location of evaluation data\",\n",
    "        required=True\n",
    "    )\n",
    "    parser.add_argument(\n",
    "        \"--output_dir\",\n",
    "        help=\"GCS location to write checkpoints and export models\",\n",
    "        required=True\n",
    "    )\n",
    "    parser.add_argument(\n",
    "        \"--batch_size\",\n",
    "        help=\"Number of examples to compute gradient over.\",\n",
    "        type=int,\n",
    "        default=512\n",
    "    )\n",
    "\n",
    "    # TODO: Add nnsize argument\n",
    "\n",
    "    # TODO: Add nembeds argument\n",
    "\n",
    "    # TODO: Add num_epochs argument\n",
    "\n",
    "    # TODO: Add train_examples argument\n",
    "\n",
    "    # TODO: Add eval_steps argument\n",
    "\n",
    "    # Parse all arguments\n",
    "    args = parser.parse_args()\n",
    "    arguments = args.__dict__\n",
    "\n",
    "    # Unused args provided by service\n",
    "    arguments.pop(\"job_dir\", None)\n",
    "    arguments.pop(\"job-dir\", None)\n",
    "\n",
    "    # Modify some arguments\n",
    "    arguments[\"train_examples\"] *= 1000\n",
    "\n",
    "    # Append trial_id to path if we are doing hptuning\n",
    "    # This code can be removed if you are not using hyperparameter tuning\n",
    "    arguments[\"output_dir\"] = os.path.join(\n",
    "        arguments[\"output_dir\"],\n",
    "        json.loads(\n",
    "            os.environ.get(\"TF_CONFIG\", \"{}\")\n",
    "        ).get(\"task\", {}).get(\"trial\", \"\")\n",
    "    )\n",
    "\n",
    "    # Run the training job\n",
    "    model.train_and_evaluate(arguments)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the same way we can write to the file `model.py` the model that we developed in the previous notebooks. \n",
    "\n",
    "### Lab Task #3: Create trainer module's model.py to hold Keras model code.\n",
    "\n",
    "Complete the TODOs in the code cell below to create our `model.py`. We'll use the code we wrote for the Wide & Deep model. Look back at your [4c_keras_wide_and_deep_babyweight.ipynb](../solutions/4c_keras_wide_and_deep_babyweight.ipynb) notebook and copy/paste the necessary code from that notebook into its place in the cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile babyweight/trainer/model.py\n",
    "import datetime\n",
    "import os\n",
    "import shutil\n",
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "\n",
    "# Determine CSV, label, and key columns\n",
    "# TODO: Add CSV_COLUMNS and LABEL_COLUMN\n",
    "\n",
    "# Set default values for each CSV column.\n",
    "# Treat is_male and plurality as strings.\n",
    "# TODO: Add DEFAULTS\n",
    "\n",
    "\n",
    "def features_and_labels(row_data):\n",
    "    # TODO: Add your code here\n",
    "    pass\n",
    "\n",
    "\n",
    "def load_dataset(pattern, batch_size=1, mode='eval'):\n",
    "    # TODO: Add your code here\n",
    "    pass\n",
    "\n",
    "\n",
    "def create_input_layers():\n",
    "    # TODO: Add your code here\n",
    "    pass\n",
    "\n",
    "\n",
    "def categorical_fc(name, values):\n",
    "    # TODO: Add your code here\n",
    "    pass\n",
    "\n",
    "\n",
    "def create_feature_columns(nembeds):\n",
    "    # TODO: Add your code here\n",
    "    pass\n",
    "\n",
    "\n",
    "def get_model_outputs(wide_inputs, deep_inputs, dnn_hidden_units):\n",
    "    # TODO: Add your code here\n",
    "    pass\n",
    "\n",
    "\n",
    "def rmse(y_true, y_pred):\n",
    "    # TODO: Add your code here\n",
    "    pass\n",
    "\n",
    "\n",
    "def build_wide_deep_model(dnn_hidden_units=[64, 32], nembeds=3):\n",
    "    # TODO: Add your code here\n",
    "    pass\n",
    "\n",
    "\n",
    "def train_and_evaluate(args):\n",
    "    model = build_wide_deep_model(args[\"nnsize\"], args[\"nembeds\"])\n",
    "    print(\"Here is our Wide-and-Deep architecture so far:\\n\")\n",
    "    print(model.summary())\n",
    "\n",
    "    trainds = load_dataset(\n",
    "        args[\"train_data_path\"],\n",
    "        args[\"batch_size\"],\n",
    "        'train')\n",
    "\n",
    "    evalds = load_dataset(\n",
    "        args[\"eval_data_path\"], 1000, 'eval')\n",
    "    if args[\"eval_steps\"]:\n",
    "        evalds = evalds.take(count=args[\"eval_steps\"])\n",
    "\n",
    "    num_batches = args[\"batch_size\"] * args[\"num_epochs\"]\n",
    "    steps_per_epoch = args[\"train_examples\"] // num_batches\n",
    "\n",
    "    checkpoint_path = os.path.join(args[\"output_dir\"], \"checkpoints/babyweight\")\n",
    "    cp_callback = tf.keras.callbacks.ModelCheckpoint(\n",
    "        filepath=checkpoint_path, verbose=1, save_weights_only=True)\n",
    "\n",
    "    history = model.fit(\n",
    "        trainds,\n",
    "        validation_data=evalds,\n",
    "        epochs=args[\"num_epochs\"],\n",
    "        steps_per_epoch=steps_per_epoch,\n",
    "        verbose=2,  # 0=silent, 1=progress bar, 2=one line per epoch\n",
    "        callbacks=[cp_callback])\n",
    "\n",
    "    EXPORT_PATH = os.path.join(\n",
    "        args[\"output_dir\"], datetime.datetime.now().strftime(\"%Y%m%d%H%M%S\"))\n",
    "    tf.saved_model.save(\n",
    "        obj=model, export_dir=EXPORT_PATH)  # with default serving function\n",
    "    print(\"Exported trained model to {}\".format(EXPORT_PATH))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train locally\n",
    "\n",
    "After moving the code to a package, make sure it works as a standalone. Note, we incorporated the `--train_examples` flag so that we don't try to train on the entire dataset while we are developing our pipeline. Once we are sure that everything is working on a subset, we can change it so that we can train on all the data. Even for this subset, this takes about *3 minutes* in which you won't see any output ..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Lab Task #4: Run trainer module package locally.\n",
    "\n",
    "Fill in the missing code in the TODOs below so that we can run a very small training job over a single file with a small batch size, 1 epoch, 1 train example, and 1 eval step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "OUTDIR=babyweight_trained\n",
    "rm -rf ${OUTDIR}\n",
    "export PYTHONPATH=${PYTHONPATH}:${PWD}/babyweight\n",
    "python3 -m trainer.task \\\n",
    "    --job-dir=./tmp \\\n",
    "    --train_data_path=gs://${BUCKET}/babyweight/data/train*.csv \\\n",
    "    --eval_data_path=gs://${BUCKET}/babyweight/data/eval*.csv \\\n",
    "    --output_dir=${OUTDIR} \\\n",
    "    --batch_size=# TODO: Add batch size\n",
    "    --num_epochs=# TODO: Add the number of epochs to train for\n",
    "    --train_examples=# TODO: Add the number of examples to train each epoch for\n",
    "    --eval_steps=# TODO: Add the number of evaluation batches to run"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training on Cloud AI Platform\n",
    "\n",
    "Now that we see everything is working locally, it's time to train on the cloud! "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To submit to the Cloud we use [`gcloud ai-platform jobs submit training [jobname]`](https://cloud.google.com/sdk/gcloud/reference/ml-engine/jobs/submit/training) and simply specify some additional parameters for AI Platform Training Service:\n",
    "- jobname: A unique identifier for the Cloud job. We usually append system time to ensure uniqueness\n",
    "- job-dir: A GCS location to upload the Python package to\n",
    "- runtime-version: Version of TF to use.\n",
    "- python-version: Version of Python to use. Currently only Python 3.7 is supported for TF 2.1.\n",
    "- region: Cloud region to train in. See [here](https://cloud.google.com/ml-engine/docs/tensorflow/regions) for supported AI Platform Training Service regions\n",
    "\n",
    "Below the `-- \\` we add in the arguments for our `task.py` file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "OUTDIR=gs://${BUCKET}/babyweight/trained_model\n",
    "JOBID=babyweight_$(date -u +%y%m%d_%H%M%S)\n",
    "\n",
    "gcloud ai-platform jobs submit training ${JOBID} \\\n",
    "    --region=${REGION} \\\n",
    "    --module-name=trainer.task \\\n",
    "    --package-path=$(pwd)/babyweight/trainer \\\n",
    "    --job-dir=${OUTDIR} \\\n",
    "    --staging-bucket=gs://${BUCKET} \\\n",
    "    --master-machine-type=n1-standard-8 \\\n",
    "    --scale-tier=CUSTOM \\\n",
    "    --runtime-version=${TFVERSION} \\\n",
    "    --python-version=${PYTHONVERSION} \\\n",
    "    -- \\\n",
    "    --train_data_path=gs://${BUCKET}/babyweight/data/train*.csv \\\n",
    "    --eval_data_path=gs://${BUCKET}/babyweight/data/eval*.csv \\\n",
    "    --output_dir=${OUTDIR} \\\n",
    "    --num_epochs=10 \\\n",
    "    --train_examples=10000 \\\n",
    "    --eval_steps=100 \\\n",
    "    --batch_size=32 \\\n",
    "    --nembeds=8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The training job should complete within 10 to 15 minutes. You do not need to wait for this training job to finish before moving forward in the notebook, but will need a trained model to complete our next lab."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dockerized module\n",
    "\n",
    "Since we are using TensorFlow 2.3 and it is new, we will use a container image to run the code on AI Platform.\n",
    "\n",
    "Once TensorFlow 2.3 is natively supported on AI Platform, you will be able to simply do (without having to build a container):\n",
    "<pre>\n",
    "gcloud ai-platform jobs submit training ${JOBNAME} \\\n",
    "    --region=${REGION} \\\n",
    "    --module-name=trainer.task \\\n",
    "    --package-path=$(pwd)/babyweight/trainer \\\n",
    "    --job-dir=${OUTDIR} \\\n",
    "    --staging-bucket=gs://${BUCKET} \\\n",
    "    --scale-tier=STANDARD_1 \\\n",
    "    --runtime-version=${TFVERSION} \\\n",
    "    -- \\\n",
    "    --train_data_path=gs://${BUCKET}/babyweight/data/train*.csv \\\n",
    "    --eval_data_path=gs://${BUCKET}/babyweight/data/eval*.csv \\\n",
    "    --output_dir=${OUTDIR} \\\n",
    "    --num_epochs=10 \\\n",
    "    --train_examples=20000 \\\n",
    "    --eval_steps=100 \\\n",
    "    --batch_size=32 \\\n",
    "    --nembeds=8\n",
    "</pre>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Dockerfile\n",
    "\n",
    "We need to create a container with everything we need to be able to run our model. This includes our trainer module package, python3, as well as the libraries we use such as the most up to date TensorFlow 2.0 version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile babyweight/Dockerfile\n",
    "FROM gcr.io/deeplearning-platform-release/tf2-cpu\n",
    "COPY trainer /babyweight/trainer\n",
    "RUN apt update && \\\n",
    "    apt install --yes python3-pip && \\\n",
    "    pip3 install --upgrade --quiet tensorflow==2.1 && \\\n",
    "    pip3 install --upgrade --quiet cloudml-hypertune\n",
    "\n",
    "ENV PYTHONPATH ${PYTHONPATH}:/babyweight\n",
    "ENTRYPOINT [\"python3\", \"babyweight/trainer/task.py\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Build and push container image to repo\n",
    "\n",
    "Now that we have created our Dockerfile, we need to build and push our container image to our project's container repo. To do this, we'll create a small shell script that we can call from the bash."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile babyweight/push_docker.sh\n",
    "export PROJECT_ID=$(gcloud config list project --format \"value(core.project)\")\n",
    "export IMAGE_REPO_NAME=babyweight_training_container\n",
    "export IMAGE_URI=gcr.io/${PROJECT_ID}/${IMAGE_REPO_NAME}\n",
    "\n",
    "echo \"Building  $IMAGE_URI\"\n",
    "docker build -f Dockerfile -t ${IMAGE_URI} ./\n",
    "echo \"Pushing $IMAGE_URI\"\n",
    "docker push ${IMAGE_URI}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note:** If you get a permissions/stat error when running push_docker.sh from Notebooks, do it from CloudShell:\n",
    "\n",
    "Open CloudShell on the GCP Console\n",
    "* git clone https://github.com/GoogleCloudPlatform/training-data-analyst\n",
    "* cd training-data-analyst/courses/machine_learning/deepdive2/structured/solutions/babyweight\n",
    "* bash push_docker.sh\n",
    "\n",
    "This step takes 5-10 minutes to run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "cd babyweight\n",
    "bash push_docker.sh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Kindly ignore the incompatibility errors."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Test container locally\n",
    "\n",
    "Before we submit our training job to Cloud AI Platform, let's make sure our container that we just built and pushed to our project's container repo works perfectly. We can do that by calling our container in bash and passing the necessary user_args for our task.py's parser."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "export PROJECT_ID=$(gcloud config list project --format \"value(core.project)\")\n",
    "export IMAGE_REPO_NAME=babyweight_training_container\n",
    "export IMAGE_URI=gcr.io/${PROJECT_ID}/${IMAGE_REPO_NAME}\n",
    "echo \"Running  $IMAGE_URI\"\n",
    "docker run ${IMAGE_URI} \\\n",
    "    --train_data_path=gs://${BUCKET}/babyweight/data/train*.csv \\\n",
    "    --train_data_path=gs://${BUCKET}/babyweight/data/train*.csv \\\n",
    "    --eval_data_path=gs://${BUCKET}/babyweight/data/eval*.csv \\\n",
    "    --output_dir=gs://${BUCKET}/babyweight/trained_model \\\n",
    "    --batch_size=10 \\\n",
    "    --num_epochs=10 \\\n",
    "    --train_examples=1 \\\n",
    "    --eval_steps=1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Task #5: Train on Cloud AI Platform.\n",
    "\n",
    "Once the code works in standalone mode, you can run it on Cloud AI Platform. Because this is on the entire dataset, it will take a while. The training run took about <b> two hours </b> for me. You can monitor the job from the GCP console in the Cloud AI Platform section. Complete the __#TODO__s to make sure you have the necessary user_args for our task.py's parser."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "OUTDIR=gs://${BUCKET}/babyweight/trained_model\n",
    "JOBID=babyweight_$(date -u +%y%m%d_%H%M%S)\n",
    "echo ${OUTDIR} ${REGION} ${JOBNAME}\n",
    "# gcloud storage rm --recursive --continue-on-error ${OUTDIR}\n",    "\n",
    "IMAGE=gcr.io/${PROJECT}/babyweight_training_container\n",
    "\n",
    "gcloud ai-platform jobs submit training ${JOBID} \\\n",
    "    --staging-bucket=gs://${BUCKET} \\\n",
    "    --region=${REGION} \\\n",
    "    --master-image-uri=${IMAGE} \\\n",
    "    --master-machine-type=n1-standard-4 \\\n",
    "    --scale-tier=CUSTOM \\\n",
    "    -- \\\n",
    "    --train_data_path=# TODO: Add path to training data in GCS\n",
    "    --eval_data_path=# TODO: Add path to evaluation data in GCS\n",
    "    --output_dir=${OUTDIR} \\\n",
    "    --num_epochs=# TODO: Add the number of epochs to train for\n",
    "    --train_examples=# TODO: Add the number of examples to train each epoch for\n",
    "    --eval_steps=# TODO: Add the number of evaluation batches to run\n",
    "    --batch_size=# TODO: Add batch size\n",
    "    --nembeds=# TODO: Add number of embedding dimensions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When I ran it, I used train_examples=2000000. When training finished, I filtered in the Stackdriver log on the word \"dict\" and saw that the last line was:\n",
    "<pre>\n",
    "Saving dict for global step 5714290: average_loss = 1.06473, global_step = 5714290, loss = 34882.4, rmse = 1.03186\n",
    "</pre>\n",
    "The final RMSE was 1.03 pounds."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Task #6: Hyperparameter tuning.\n",
    "\n",
    "All of these are command-line parameters to my program.  To do hyperparameter tuning, create `hyperparam.yaml` and pass it as `--config hyperparam.yaml`.\n",
    "This step will take <b>up to 2 hours</b> -- you can increase `maxParallelTrials` or reduce `maxTrials` to get it done faster.  Since `maxParallelTrials` is the number of initial seeds to start searching from, you don't want it to be too large; otherwise, all you have is a random search. Complete __#TODO__s in yaml file and gcloud training job bash command so that we can run hyperparameter tuning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile hyperparam.yaml\n",
    "trainingInput:\n",
    "    scaleTier: STANDARD_1\n",
    "    hyperparameters:\n",
    "        hyperparameterMetricTag: # TODO: Add metric we want to optimize\n",
    "        goal: # TODO: MAXIMIZE or MINIMIZE?\n",
    "        maxTrials: 20\n",
    "        maxParallelTrials: 5\n",
    "        enableTrialEarlyStopping: True\n",
    "        params:\n",
    "        - parameterName: batch_size\n",
    "          type: # TODO: What datatype?\n",
    "          minValue: # TODO: Choose a min value\n",
    "          maxValue: # TODO: Choose a max value\n",
    "          scaleType: # TODO: UNIT_LINEAR_SCALE or UNIT_LOG_SCALE?\n",
    "        - parameterName: nembeds\n",
    "          type: # TODO: What datatype?\n",
    "          minValue: # TODO: Choose a min value\n",
    "          maxValue: # TODO: Choose a max value\n",
    "          scaleType: # TODO: UNIT_LINEAR_SCALE or UNIT_LOG_SCALE?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "OUTDIR=gs://${BUCKET}/babyweight/hyperparam\n",
    "JOBNAME=babyweight_$(date -u +%y%m%d_%H%M%S)\n",
    "echo ${OUTDIR} ${REGION} ${JOBNAME}\n",
    "gcloud storage rm --recursive --continue-on-error ${OUTDIR}\n",    "\n",
    "IMAGE=gcr.io/${PROJECT}/babyweight_training_container\n",
    "\n",
    "gcloud ai-platform jobs submit training ${JOBNAME} \\\n",
    "    --staging-bucket=gs://${BUCKET} \\\n",
    "    --region=${REGION} \\\n",
    "    --master-image-uri=${IMAGE} \\\n",
    "    --master-machine-type=n1-standard-4 \\\n",
    "    --scale-tier=CUSTOM \\\n",
    "    --# TODO: Add config for hyperparam.yaml\n",
    "    -- \\\n",
    "    --train_data_path=gs://${BUCKET}/babyweight/data/train*.csv \\\n",
    "    --eval_data_path=gs://${BUCKET}/babyweight/data/eval*.csv \\\n",
    "    --output_dir=${OUTDIR} \\\n",
    "    --num_epochs=10 \\\n",
    "    --train_examples=20000 \\\n",
    "    --eval_steps=100"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Repeat training\n",
    "\n",
    "This time with tuned parameters for `batch_size` and `nembeds`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "OUTDIR=gs://${BUCKET}/babyweight/trained_model_tuned\n",
    "JOBNAME=babyweight_$(date -u +%y%m%d_%H%M%S)\n",
    "echo ${OUTDIR} ${REGION} ${JOBNAME}\n",
    "gcloud storage rm --recursive --continue-on-error ${OUTDIR}\n",    "\n",
    "IMAGE=gcr.io/${PROJECT}/babyweight_training_container\n",
    "\n",
    "gcloud ai-platform jobs submit training ${JOBNAME} \\\n",
    "    --staging-bucket=gs://${BUCKET} \\\n",
    "    --region=${REGION} \\\n",
    "    --master-image-uri=${IMAGE} \\\n",
    "    --master-machine-type=n1-standard-4 \\\n",
    "    --scale-tier=CUSTOM \\\n",
    "    -- \\\n",
    "    --train_data_path=gs://${BUCKET}/babyweight/data/train*.csv \\\n",
    "    --eval_data_path=gs://${BUCKET}/babyweight/data/eval*.csv \\\n",
    "    --output_dir=${OUTDIR} \\\n",
    "    --num_epochs=10 \\\n",
    "    --train_examples=20000 \\\n",
    "    --eval_steps=100 \\\n",
    "    --batch_size=32 \\\n",
    "    --nembeds=8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Summary: \n",
    "In this lab, we set up the environment, created the trainer module's task.py to hold hyperparameter argparsing code, created the trainer module's model.py to hold Keras model code, ran the trainer module package locally, submitted a training job to Cloud AI Platform, and submitted a hyperparameter tuning job to Cloud AI Platform."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright 2019 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
